Efficient Item Set Mining Supported by IMine Index

Author

  • Bharathi
Abstract

This paper presents the IMine index, a general and compact structure that tightly integrates item set extraction into a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different item set extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v.2 algorithms, but they can straightforwardly support the enforcement of various constraint categories. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical-level access methods. Experiments, run for both sparse and dense data distributions, show the efficiency of the proposed index and its linear scalability, even for large data sets. Item set mining supported by the IMine index shows performance always comparable with, and often (especially for low supports) better than, state-of-the-art algorithms accessing data in flat files.

Index Terms: Data mining, item set extraction, indexing.
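To make the task concrete, here is a minimal sketch of item set extraction under a minimum support threshold. It is a plain level-wise, in-memory search, not FP-growth, LCM v.2, or the IMine index itself; the function name `frequent_item_sets` and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_item_sets(transactions, min_support):
    """Return {item set: support count} for every item set whose support
    (number of transactions containing it) is at least min_support.

    Hypothetical helper: a brute-force level-wise search, shown only to
    illustrate what "item set extraction" computes.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    result = {}
    # Level 1 candidates: every single item.
    candidates = [frozenset([item]) for item in items]
    while candidates:
        # Count the support of each candidate by scanning all transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        if not frequent:
            break
        # Build the next level by joining frequent sets that differ in one item.
        size = len(next(iter(frequent))) + 1
        candidates = list(
            {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        )
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for item_set, support in frequent_item_sets(data, min_support=3).items():
        print(sorted(item_set), support)
```

Every level rescans all transactions, which is the kind of repeated data access that an index clustering co-accessed data on the same disk block is meant to make cheaper.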

Related resources

IMine: Index Support for Item Set Mining in Item Set Extraction

Relational database management systems manage tables with predefined indexes. In an RDBMS, indexes are created on the basis of attribute values. The IMine indexing scheme is used to index item sets in relational databases. Index creation is performed without any constraints. IMine provides a complete representation of the original database. The IMine indexing scheme reduces the input/outp...

State of Art of Multi Relational Data Mining Approaches: A Rule Mining Algorithm

The 21st century is widely described as the age of information science, in which large organizations need useful knowledge. Data mining algorithms look for patterns in data. While most existing data mining approaches look for patterns in a single data table, multi-relational data mining (MRDM) approaches look for patterns that involve multiple tables (relations) from a relational database. The ...

Cover Similarity Based Item Set Mining

In standard frequent item set mining one tries to find item sets the support of which exceeds a user-specified threshold (minimum support) in a database of transactions. We, instead, strive to find item sets for which the similarity of the covers of the items (that is, the sets of transactions containing the items) exceeds a user-defined threshold. This approach yields a much better assessment ...
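The difference between support and cover similarity can be sketched as follows. The snippet above does not say which similarity measure is used, so the Jaccard index (size of the intersection of the covers divided by the size of their union) is an assumption here, as are the helper names `covers`, `support`, and `jaccard_of_covers`.

```python
def covers(transactions):
    """Map each item to its cover: the set of ids of transactions containing it."""
    cover = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            cover.setdefault(item, set()).add(tid)
    return cover


def support(item_set, cover):
    """Support = number of transactions that contain every item of item_set."""
    sets = [cover.get(item, set()) for item in item_set]
    return len(set.intersection(*sets)) if sets else 0


def jaccard_of_covers(item_set, cover):
    """Assumed similarity measure: |intersection of covers| / |union of covers|."""
    sets = [cover.get(item, set()) for item in item_set]
    union = set().union(*sets)
    if not union:
        return 0.0
    return len(set.intersection(*sets)) / len(union)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
    cover = covers(data)
    print(support({"a", "b"}, cover))            # co-occurrence count
    print(jaccard_of_covers({"a", "b"}, cover))  # co-occurrence relative to any occurrence
```

A threshold on the similarity value then plays the role that minimum support plays in standard frequent item set mining.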

Indexed Enhancement on GenMax Algorithm for Fast and Less Memory Utilized Pruning of MFI and CFI

Mining frequent item sets is an essential problem in many data mining applications, such as the discovery of association rules, patterns, and many other important discovery tasks. Fast, memory-efficient methods for mining frequent item sets are highly desirable in transactional databases. Methods for mining frequent item sets have been implemented using a prefix-tree structure...

Publication date: 2011